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Abstract 

Despite the pleiotropic effects of the progesterone receptor in breast cancer, the molecular nnechanisms in play remain 
largely unknown. To gain a global view of the PR-orchestrated networks, we used next-generation sequencing to determine 
the progestin-regulated transcriptome in T47D breast cancer cells. We Identify a large number of PR target genes involved 
in critical cellular programs, such as regulation of transcription, apoptosis, cell motion and angiogenesis. Integration of the 
transcriptomic data with the PR-binding profiling of hormonally treated cells identifies numerous components of the small- 
GTPases signaling pathways as direct PR targets. Progestin-induced deregulation of the small GTPases may contribute to the 
PR's role in mammary tumorigenesis. Transcript expression analysis reveals significant expression changes of specific 
transcript variants in response to the extracellular hormonal stimulus. Using the NET! gene as an example, we show that the 
PR can dictate alternative promoter usage leading to the upregulation of an isoform that may play a role in metastatic 
breast cancer. Future studies should aim to characterize these selectively regulated variants and evaluate their clinical utility 
in prognosis and targeted therapy of hormonally responsive breast tumors. 
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Introduction 

Progesterone is a steroid hormone that plays a pivotal role in 
female physiology by co-coordinating diverse aspects of the 
reproductive system [1] and regulating mammary gland morpho- 
genesis [2] . It exerts its actions through the two isoforms of the 
progesterone receptor (PR-A and PR-B), which are ligand- 
activated transcription factors that belong to the nuclear receptor 
super-family. Deregulation of progesterone signaling is implicated 
in the development and progression of cancer in the hormone's 
target tissues [3]. In breast cancer the role of PR is well 
documented both in vivo and in vitro [4]. Experiments in PR 
knock-out mice demonstrated that progestins promote mammary 
tumor progression and growth [2,5,6]. Two large clinical studies in 
women [7,8] have also provided supporting evidence for a 
tumorigenic role of progesterone in the mammary tissue. In vitro 
studies have confirmed that progestin treatment affects important 
cellular programs, such as proliferation, apoptosis and differenti- 
ation [3,9], all of which have the potential to lead to a malignant 
phenotype when deregulated. 

To develop effective therapeutic schemes against PR signaling 
in breast cancer, a major requirement is the determination of the 
full repertoire of progestin-regulated genes in target cells. Gene 
expression microarray studies have been useful in characterizing 
transcriptional effects of progestin signaling [10,11,12,13,14,15]. 
However, this approach is lacking due to high levels of noise, 
relatively low sensitivity and limited number of array probes 



suggesting that a plethora of PR-regulated genes may still remain 
undetected. 

More than 90% of human genes can generate multiple 
transcript variants, which are, oftentimes, designated with tissue- 
or developmental- specific functional roles [16]. A growing 
number of studies have demonstrated the expression of cancer- 
associated variants participating in specific cellular programs, 
including apoptosis, cell growth, angiogenesis and cell motility, 
during tumor initiation and progression [17,18]. Detection and 
characterization of such variants can improve our understanding 
of the molecular mechanisms in play; it can also have significant 
impact in the clinic, since they have emerged as a promising tool 
for the diagnosis and management of the disease [19,20]. Studies 
using exoii-specific microarrays have identified estrogen-regulated 
transcript variants in breast cancer cell lines [2 1,22] . However, it is 
currently unknown to what extent progestin-regulated transcript 
variants contribute to the expression profile of breast cancer cells. 
The gene expression microarrays studies described above could 
not discriminate between variants and the reports of up- or down- 
regulation of mRNA expression levels are confounded by the 
effects of mixtures of these transcripts [19]. 

To address the above issues, we employed paired-end, next- 
generation sequencing (NGS) to interrogate the transcriptome of 
vehicle- and progestin- treated T47D breast cancer cells in an 
unbiased way. We identified hundreds of PR regulated genes that 
participate in important cellular processes, including apoptosis and 
transcription as reported before [14], but also angiogenesis and 
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cell migration. More importantly, we identified a novel group of 
PR targets that are involved in smaU-GTPases signaling. By 
employing ChlP-seq experiments for the PR, we showed that 
many of these genes were under direct progestin transcriptional 
regulation via the receptor's binding in their promoters or distal 
enhancer elements. SmaU-GTPases signaling pathways are widely 
implicated in normal physiology and disease [23]. According to 
our data some of them may be subject to PR-regulation and may 
be mediating the receptor's effects in breast cancer cells. On the 
transcript level we find that the receptor can dictate alternative 
promoter use decisions leading to significant c'xpression changes of 
specific transcript variants as a response to the external hormonal 
stimulus. Transcript variants often encode for unique protein 
isoforms with different, even antagonizing, functions [24]; 
consequentiy, expression data on the transcript level are necessary 
to paint an accurate picture of the PR-regulated proteome. 

In overall, our findings provide new insights into the molecular 
mechanisms of PR signaling and the progestin-regulated tran- 
scriptome of breast cancer cells. 

Materials and Methods 

Cell culture and Reagents 

T47D cells were purchased from ATCC and were grown in 
RPMI supplemented with 10% fetal calf serum and glutamine (Life 
Technologies) at 37°C under 5% CO2. Before hormonal treatment 
(10 nM R5020) cells were plated in RPMI/charcoal-stripped fetal 
bovine serum either 24 hours (RNA isolation) or 3 days (ChIP 
assays) before harvesting. The a-PR antibody (catalog no. sc-7208) 
was purchased from Santa Cruz Biotechnology, Inc. (Santa Cruz, 
CA), the a-PolII was from Covance (catalog no. MMS-126R) and 
the antibody against mono/di/trimethyl-Histone H3 (Lys4) was 
from Millipore (clone AW304 catalog no 04-791). 

Reverse Transcription qPCR 

Total RNA was extracted from cells using the Trizol reagent 
(Life Technologies), and 2.5 ng of RNA were reverse transcribed 
using the RevertAid First-Strand cDNA Synthesis System 
(ThermoCcientific) according to the manufacturers' instructions. 
cDNA was amplified by quantitative PGR (qPCR) using the SYBR 
FAST Universal 2X qPCR Master Mix (KAPA Biosystems). All 
experiments were performed in at least 3 biological replicates. 
Statistical analysis was performed by Student's Hest (compared 
with vehicle treatment). Primer sequences used for RT-qPCR are 
available upon request. 

Chromatin immunoprecipitation assays 

ChlPs were performed as described before [25]. Briefly, T47D 
cells were grown for 3 days in RPMI/charcoal-treated fetal calf 
serum and then treated with 10 nm R5020 for 1 hr. Fixation with 
1% formaldehyde proceeded for 10 min at 37°C and was stopped 
by the addition of glycine to a final concentration of 0.125 M. 
Cells were harvested, resuspended in lysis bufier (50 mM Tris-HCl 
pH 8, 10 mm EDTA, 1% sodium dodecyl sulfate) and fragmented 
chromatin (500-1000 bp) was generated through sonication 
(Misonix sonicator). Samples were diluted in ChIP dilution buffer 
(1.2 mMEDTA; 167 mM NaCl; 16.7 mM Tris-HCl, pH 8; 1.1% 
Triton X-100; and 0.01% SDS), precleared for 2 hr at 4°C with 
protein A-hG magnetic beads (Life Technologies) and used for the 
Chip assays with the addition of 5 ng of antibody. The next day 
the recovered immunocomplexes were washed with the following 
bufiers: washing bufier I (2 mM EDTA; 20 mM Tris-HCl, 
pH 8.0; 0.1% SDS; 1% Triton X-100; 150 mM NaCl), washing 
bufier II (2 mM EDTA; 20 mM Tris-HCl, pH 8.0; 0.1% SDS; 



1% Triton X-100; 500 mM NaCl), washing buffer III (1 mM 
EDTA; 10 mM Tris-HCl, pH 8.0; 1% Nonidet P-40; 1% 
Deoxycholate; 0.25 mM LiCl) and TE (1 mM EDTA, 10 mM 
Tris-HCl, pH 8.0). After elution, cross-linking was reversed by an 
overnight incubation at 65°C, samples were incubated with 
proteinase K and DNA was extracted with phenol-chloroform and 
EtOH precipitation. Samples were analyzed by qPCR and their 
enrichment over input was calculated by the 2"^^*^' method after 
correcting for the IgG negative controls. All experiments were 
performed in 2-4 biological replicates. Statistical analysis was 
performed by Student's i-test. Primer sequences used for ChlP- 
qPCR are available upon request. 

Library preparation for next-generation sequencing 
experiments 

For RNA-sequencing, total RNA was extracted from T47D 
cells using the Trizol reagent (Life Technologies) and was treated 
with Turbo DNase I (Ambion) for 30 min at 37°C. poly(A)"" RNA 
was isolated using oligo(dT)-conjugated magnetic beads (FastTrack 
MAG mRNA Isolation Kit, Life Technologies). Library prepara- 
tion was performed with the ScriptSeq kit (Epicentre) according to 
the manufacturer's instructions. Briefly, poly(A)"^ RNA was 
fragmented at 90°C for 6 min and was, subsequently, subjected 
to cDNA synthesis. cDNA was tagged at the 3' end, purified using 
the Agencourt AMPure XP system (BeckmanCoulter), and it was 
then converted to double-strand cDNA. This product was PCR- 
amplified for 1 1 cycles; during this step completion of the addition 
of the Illumina adaptor sequences and incorporation of an index 
(ScriptSeq Index PGR Primers, Epicentre) was performed. The 
PGR product was treated with Exo I for 15 min at 37°C and was 
purified as described above. 

For ChlP-sequencing, DNA was immunoprecipitated from ~20 
million cells, grown and treated as described above, and then it 
was purified and sonicated to ~400 bp fragments using the 
Bioruptor (Diagenode). Fragmented DNA was used for library 
construction using the NEBNext ChlP-seq sample Prep Master 
Mix Set 1 (New England Biolabs) and following the manufactur- 
er's instructions. Briefly, this product underwent end repair, cLA- 
tailing and adaptor Kgation using the lUumina specific adaptors. 
In-between enzymatic steps the samples were purified using 
Agencourt AMPure XP system (BeckmanCoulter). The libraries 
were PCR-amplified for 11—14 cycles using Phusion HotStart 
DNA polymerase. 

One nl of each library was run on the Bioanalyzer to assess 
library quantity and quality. Libraries were run on Illumina HiSeq 
2000 using 50-bp paired-end sequencing and following standard 
protocols. 

Next-generation sequencing data analysis and 
differentially expressed gene (DEG) testing 

50-bp paired-end sequencing was performed for PR-immuno- 
precipitated DNA from R5020-treated c:ells, for genomic input 
DNA and for mRNA isolated from progestin- and vehicle- treat(;d 
cells. The FASTQC package (http://www.bioinformatics. 
babraham.ac.uk /projects/fastqc/) was used to assess quality of 
reads. When necessary, Trimmomatic v.0.3 [26] was used to trim 
reads by applying the parameters LEADING: 3 TRAILING: 3 
SLIDINGWINDOW:4: 15 MINLEN:36. 

For ChlP-seq data analysis, reads were mapped to the human 
genome (hgl9) using Bowtie v.1.1.2 (default parameters, hgl9) 
[27]. SAM tools [28] were used to select for high quality, uniquely 
mapped, paired-end reads leading to 14,148,482 total reads. 
MACS [29] was used for peak calling with default parameters (fold 
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enrichment &10 and p-value£ 10"') using an equal number of 
reads from input DNA as a control. For RNA-seq data analysis, 
after quality filtering, reads were aligned to the UCSC hgl9 
reference genome using TopHat2 allowing up to two mismatches 
and discarding reads mapping at multiple locations [30] . This led 
to the generation of 76,1 18,132 and 73,174,398 unique paired-end 
reads for EtOH- and R5020- treated cells respectively. Transcript 
abundance was quantified using Cufilinks 1.2.1. [31] and the 
iGenomes Ensembl GTF annotation file as a reference. The 
normalized expression level of each transcript was measured by 
FPKM (Fragments Per Kilobase of transcript per Million mapped 
reads). A threshold of 10 mapped reads was used to define 
detection at the gene level. 

For subsequent analyses we considered the information both at 
the gene and transcript level. Differentially expressed genes and 
transcripts were called using Cuffdifl" 2 that utilizes Student's t-test 
to determine if two datasets are different from each other [32]. By 
default, CuflfdiflF 2 generates the lowest p-value to be 5x10 ^. 
Genes and transcripts showing an FPKM equal or higher than one 
at least under one condition (progestin- or vehicle- treated) were 
retained for further analysis. A threshold of 1.5 was applied on the 
fold change. Given that the RNA-Seq DEG algorithms generally 
result in much higher adjusted p-values (0.03~0.12) than their 
microarray counterparts (<0.01) [33], and based on the fact that 
several previously identified PR-regulated genes were listed in our 
data with p-values up to 0.38, we decided to test higher p-values as 
cut-offs for the identification of PR-regulated genes. Extensive 
validation of our data by RT-qPCR led us to finally use a p-value 
cut-off of 0.15. Using higher p-values led to an increase in the 
number of false positives. For differentially expressed transcripts 
the p-value cut-off was £0.05. Further manipulation of the data 
was done with in-house scripts. All datasets have been deposited to 
GEO (accession number GSE51428). 

Gene Ontology analysis 

For functional enrichment analysis of the differentially ex- 
pressed genes (DEGs) the module FatiGO [34] of the Babelomics 
bioinformatics suite [35] and the DAVID functional annotation 
tools [36] were used. Both algorithms use Fischer's exact test to 
check for significant over-representation of Gene Ontology (GO) 
annotations, but differ to the gene reference background they use. 
FatiGO compares DEGs with respect to the whole human 
genome, while DAVID is more conservative and uses genes 
associated with terms in the corresponding annotation categories 
as the reference background. 

Results and Discussion 

Characterization of the RNA-sequencing data 

In ()rd(-r to drt(;rmine the PR-regulat(;d transcriptome in the 
breast cancxr milieu, mRNA was isolated from T47D cells and 
was subsequently subjected to 50-bp paired-end serjuencing. 
Reads were aUgned against the UCSC hgl9 reference genome 
using TopHat 2 [30] and transcript assembly of the afigned reads 
was performed using Cufflinks [31] and the Refseq database for 
reference gene annotation (see Materials and Methods). 

To evaluate our data, we performed an initial analysis on the 
gene level. The global profiles of gene expression between the two 
samples were highly correlated with the Pearson correlation 
coefficient being 0.97 (Figure lA). Among the top 100 most 
highly expressed genes (data not shown), we identified the 
expected housekeeping ones (e.g. GAPDH, PPIA and TUBAIB) 
and several genes associated with the healthy (e.g. PRLR), or the 
neoplastic mammary tissue (e.g. PIP, KRT19 and MUCl) 



[37,38,39]. Also included in this list there were members of the 
heat shock protein 90 family [HSP9()-AA1, -ABl and -Bl), known 
molecular chaperones of SHRs [40], and several proto-oncogenes, 
such as AGR2 [41], RACl [42] and GjNAS [43]. Notably, we also 
found very highly expressed the guanine nucleotide binding 
protein (G protein), beta polypeptide 2-like 1 {GJVB2L}, calnexin 
{CANX), calreticuUn {CALR) and beta-2 microglobuhn {B2M}; these 
are all genes associated with the breast cancer phenotype and they 
were among the most highly expressed in all breast tumors assayed 
by RNA-sequencing in a recent study [44] . 

In overall, the above findings validate our RNA-sequencing 
data including the transcript assembly and the calculation of 
transcript abundance as FPKM values. 

Identification and validation of differentially expressed 
genes 

Gene expression microarray experiments are commonly per- 
formed after 6-24 hr of progestin treatment and inevitably identify 
not only primary but also secondary- PR targets. The greatiy 
enhanced sensitivity and accuracy delivered by deep-sequencing 
allowed us to use a shorter treatment period (3 hr) to enrich the 
differentially regulated genes we identify for primary PR targets. 
The transcriptomes of the progestin- and vehicle- treated cells 
were compared using CufFdifF 2 [32], a differentially expressed 
genes algorithm. In total, we identified 1287 DEGs and a detailed 
list of them is provided in Table SI. The log2(fold change) of the 
ratio of EtOH-treated to R5020-treated is given. We set two 
stringency thresholds to classify all DEGs (see Materials and 
Methods). The high stringency group included 711 genes and the 
low stringency group included 576 genes (Figure IB). The 
majority of the PR targets were up-regulated after progestin 
treatment (896 genes), while 391 were down-regulated 
(Figure IC). 

To validate our RNA-seq experiments and data analysis, the 
expression levels of several DEGs from both stringency groups 
were assayed by RT-qPCR and a side-by-side comparison of these 
results with the RNA-seq data is shown in Figure 2. All genes 
examined were found to be progestin-regulated in agreement with 
the RNA-seq data. In Figure 2A, we present the results from the 
high stringency genes. We examined a larger number of genes 
from the low stringency group (Figure 2B) to ensure that we did 
not include false positives in our analysis. We observed from our 
RNA-seq data analysis that most genes in the low stringency group 
were downregulated (Figure IC), thus we selected to examine an 
equal number of induced and repressed genes from this group 
(Figure 2B). For two of these genes, ABCC3 and PYCARD, the 
degree of downregulation was not accurately measured by RNA- 
seq, but in overall, changes in expression levels detected by the two 
techniques were very similar for most genes tested providing 
additional support for the accuracy of the RNA-seq data. 

We also compared our dataset with several other previously 
published datasets of PR-regulated genes generated by microarray 
expression experiments in T47D cells [10,11,12,13,14,1,5]. We 
found that both stringency groups contained previously reported 
PR-regulated genes (350 out of the 1287) (Table SI). 

The above data, taken together, further confirm the vahdity of 
our RNA-seq data and analysis and lead to the identification of 
hundreds of new PR targets. 

Early progestin-induced gene expression changes in 
breast cancer cells 

To gain some insight into the biological significance of our data, 
we performed gene ontology analysis for all DEGs using the tools 
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Figure 1. Identification of PR-regulated genes by RNA-sequencing. (A) Scatter plot of global gene expression in the EtOH- and R5020- 
treated T47D cells. In total, 10,997 and 10,930 genes (FPKIVI>1) are expressed respectively. The high Pearson correlation coefficient (r = 0.966) 
indicates similar expression profiles between the two samples as expected. (B) Volcano plot (p-value vs. fold change of expression) for all differentially 
expressed genes (DEGs) between vehicle- and progestin- treated cells. To determine DEGs, a threshold of 1 .5-fold change was set and p-value cut-offs 
were 0.05 and 0.15 for the high (dark blue) and low (light blue) stringency groups respectively. (C) A total of 711 up-regulated and 576 down- 
regulated genes were identified and classified to two stringency groups, as described above. 
doi:1 0.1 371 /journal.pone.0098404.g001 



FatiGO [34] and DAVID [36]; both algorithms yielded similar 
results. GO analysis was organized around the three basic 
"principles" of biological process, molecular function and cellular 
component (Figure 3). 

Key biological processes associated with breast cancer initiation 
and progression, such as regulation of transcription, apoptosis and 
cell proliferation, were found to be highly enriched among PR 
targets (Figure 3A). The DEGs involved in regulation of cell 
death are listed in File SI, table A. Interestingly, they include an, 
approximately, equal number of genes that promote or suppress 
apoptosis (Figure SI) in accordance with the dual role of PR in 
cell proliferation [45] . Right after progestin treatment T47D cells 
go through a proliferative phase followed by a second phase of 
growth inhibition [45]. Our data show that early transcriptional 
responses are not restricted to cell proliferative genes, but also 
include induction of genes with apoptotic effects. This suggests that 
the PR initiates simultaneously a proliferative and an anti- 
proliferative program and the decision which one will prevail, 
probably, depends upon the specific cell context. These findings fit 
with the hypothesis by Lange et al that progesterone acts as a 
priming agent that induces cellular changes that permit other 



factors to influence the ultimate proliferative or diflferentiative state 
of the cells [46]. 

The most numerous target group (258 out of 1200 genes 
analyzed by DAVID) is comprised of genes playing a role in 
regulation of transcription (File SI, table B). It includes members 
of principal families of transcription factors, such as the GATA, 
FOX and E2F families that control a wide spectrum of biological 
functions. Deregulation of the expression of these factors has a 
crucial role in the development and progression of cancer 
[47,48,49]. Another example is the family of the Kriippel-like 
factors (KLF) with most of its members being under progestin 
regulation in T47D cells (Table SI). Our data confirm previous 
findings that several KLFs are PR targets (see Table SI) and they 
add to this list KLFs 3, 6, and 7 (Table SI and Figure 2B). 
Recent studies have documented KLF family members in the 
control of cell proliferation, differentiation, and apoptosis in 
steroid-responsive mammary and uterine endometrial cells [50]; 
some of them have been implicated in breast cancer progression 
[50]. It is reasonable to assume that cross-talk between KLF and 
PR signaling may partiy mediate the receptor's effects on these 
processes. Deregulation of PR signaling may lead to aberrant 




AB0C3 GLIS2 HEXIM2 KLF3 KLF7 LIMD1 NR3C2 PMAIP1 PYCARD TMEM173 



Figure 2. Differentially expressed genes detected by RNA-seq are validated by RT-qPCR. T47D cells were treated for 3 hr with R5020 or 
EtOH. RNA was extracted and used for RT-qPCR using specific primers for the genes shown. Expression levels were normalized to GAPDH. A side-by- 
side comparison of the RT-qPCR results with the RNA-seq data is shown. (A) Expression levels of high stringency genes as measured by RT-qPCR and 
RNA-seq. F0XA1 was the only selected down-regulated gene and its expression pattern was confirmed. (B) Expression levels of low stringency genes 
as measured by RT-qPCR and RNA-seq. Five downregulated (ABCC3, GLIS2, HEXIM2, KLFS and PYCARD) and five upregulated genes were selected and 
confirmed by RT-qPCR. Error bars represent the SEM. 
doi:1 0.1 371/journal.pone.0098404.g002 
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Figure 3. Gene ontology analysis for PR target genes. (A) Highly enriched biological processes were determined by using FatiGO. All GO 
annotations related to the same biological process are shown in the same color. Grey bars represent other processes. (B) Highly enriched molecular 
functions were determined by using FatiGO and the top fifteen are presented here. Results are clustered by function and the percentage of DEGs that 
were associated with each GO annotation is shown. (C) Eight of the top ten most highly enriched GO annotations for cellular component (as 
determined by FatiGO) fall into 3 broad categories: plasma membrane, cell junction and Golgi apparatus. Percentage of DEGs that were identified in 
each category is shown. 
doi:1 0.1 371/journal.pone.0098404.g003 



expression of KLFs and loss of control over these critical events in 
breast cancer. 

In accordance with the above analysis, we, also, find that the 
most highly enriched molecular functions in the DEGs are related 
to transcription factor activity (blue sectors in Figure 3B), 
supporting the notion that the PR initiates a new transcriptional 
program in the cell as a response to the external stimulus. 

Other biological pathways that are overrepresented are related 
to cell motility and angiogenesis (Figure 3A), which are key steps 
in cancer cell growth and metastasis. Shortly after progestin 
treatment the expression status of several genes involved in cell 
motion (File SI, table C) is affected substantiating previous 
reports of progestin-induced breast cancer cell migration and 
invasion [51,52]. Moreover, a significant number of PR targets are 
components of cell junctions (cell to cell and cell to extracellular 
matrix) (Figure 3C) confirming that the receptor plays an 
important role in regulating cell motion. Our data also offer 
new insight into the role of PR in angiogenesis by identifying 
dozens of target genes involved in this process (File SI, table D). 
VEGFA, a potent angiogenic factor that stimulates breast cancer 
cell growth in vitro and in vim [53] and THBSl, an inhibitor of 
angiogenesis that promotes tumor progression and metastasis [54] , 



had been reported before as PR-regi.ilated genes [55,56,57,58]. 
We find that other well-known angiogenic factors, such as ACVRl 
and EDMl are, also, under progestin control (File SI, table D). 
These data strongly suggest that PR-orchestrated networks are 
involved in cell motility and vasculature development in breast 
cancer cells. It is plausible that aberrant PR signaling in breast 
tumors may be a contributing factor to tumor vascularization and 
metastasis. 

Notably, the same functional categories are also enriched when 
examining the high stringency group alone, confirming the validity 
of our differential gene analysis (data not shown). 

Components of the small-GTPase signaling patliways are 
direct PR targets 

The GO analysis described above revealed that a significant 
fraction of PR targets is involved in "smaU-GTPases mediated 
signal transduction" (Figure 3A), most likely functioning as 
"GTPase regulators" (Figure 3B). This was a novel finding that 
raised our interest, since deregulation of small-GTPases signaling 
may play a role in cancer progression [23]. The small GTPases are 
small G-proteins that can bind and hydrolyze GTP, cycling in this 
way between an inactive (GDP-bound) and an active (GTP-bound) 
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state. Their activity is regulated by GTPase activating proteins 
(GAPs) and guanine nucleotide exchange factors (GEFs). They are 
localized to multiple membrane compartments including the 
plasma membrane and the Golgi apparatus and this is, probably, 
the reason behind the significant number of PR targets associated 
with these cellular components (Figure 3C). 

Functional annotation using FaTIGO [34] and DAVID [36] 
and manual inspection of all DEGs identified 74 genes in totcJ 
involved in small-GTPases signal transduction. The vast majority 
of them (59 of 74) are small-GTPas(-s that belong to the Ras 
superfamUy or they are regulators of these enzymes (Table 1). 
Several of these genes were experimentally vahdated by RT-qPCR 
(Figure 4A). Three hours after progestin treatment all genes 
assayed were significantiy up-regulated, except RERG and VAV3 
that were repressed in agreement with the RNA-seq data (Table 
SI). 

The above data demonstrate that several small-GTPases, GAPs 
and GEFs are early (immediate) PR targets. Next, we asked 
whether these genes were, also, direct PR targets, where the 
receptor directly controlled their expression by binding to 
regulatory sequences. To this end we performed ChIP experi- 
ments with an antibody against the PR followed by paired-end 
NGS to accurately map the receptor's binding sites. In overall, we 
identified 11,180 PR binding sites in treated T47D cells. 
Validation of these data is shown in Figure S2. FKBP5 is a 
well-known PR target and various PR binding sites have been 
reported in a distant intron of the human locus and in the first 
intron of the mouse one [2.5, .59]. Our ClilP-seq experiments 
identified a number of new PR binding sites several kilobases 
upstream of the FKBP5 transcript variant 1 (NM_0041 1 7) (Figure 
S2A), which is the PR-regulated transcript (File S2, table 
A_DETs). These sites were all verified by ChlP-qPCR (Figure 
S2B). We also assayed by ChlP-qPCR several other PR binding 
sites associated with PR-regulated genes and they were all found to 
be enriched for receptor binding (Figure S2C). 

Examination of our ChlP-seq dataset indicated that the 
majority of small GTPases, GAPs and GEFs (34 out of 59) were 
indeed associated with a PR-binding site within 50 Kb from the 
gene locus. Since the Rho GEFs are overrepresented among the 
PR targets listed in Table 1, we selected three of them (METl, 
FGD4 and AKAP13) for further analysis. Our ChlP-seq data 
revealed a PR binding site in an intron of the NETl gene and in 
distal intergenic regions ~50 Kb up-stream of the transcription 
start sites of FGD4 and AKAP13 (Figure 4B). ChlP-qPCR 
experiments confirmed that these sites were enriched for PR 
binding after progestin treatment (Figure 4C). To further show 
that receptor recruitment on these sites is not a random event, but 
it has a functional role, we performed ChlPs for H3K4me, a mark 
for promoters and enhancers [60] . We found that these sites were 
enriched for this histone modification (Figure 4D) strongly 
arguing in favor of their regulatory role. 

Taken together tlu; above data suggest that the PR can 
modulate the expression of a large number of small-GTPases and 
associated regulators. Several of them appear to be direct PR 
targets as their mRNA levels are affected shortly after progestin 
treatment and following PR binding in regulatory sites. Most PR 
targets are Ras and Rho small-GTPases regulators (GAPs and 
GEFs) (Table 1). The Ras family members are involved in control 
of cell proliferation, while the Rho GTPases play an important 
role in cytoskeleton organization [23], and as such they are 
involved in cell adhesion, migration, proliferation, survival, 
differentiation and malignant transformation [61]. We confirmed 
that several Rho GEFs are under direct PR regulation 
(Figures 4A and 4C). It is plausible that, in the breast cancer 



milieu, the intensified progestin stimulus induces over-expression 
of Rho GEFs leading to aberrant activation of the cognate small- 
GTPases, which, in this way, mediate hormonal control over 
important biological processes. In agreement with this hypothesis, 
the most common cause of aberrant smaU-GTPase signaling in 
cancer is the altered expression or activation of their regulators 
[61]. 

Notably, it has been shown that non-genomic actions of the 

ligand-activated PR lead to activation of the RhoA/Rho- 
associated kinase (ROCK-2) cascade in T47D cells [52,62]. This 
signaling pathway, ultimately, directs remodeling of the actin 
cytoskeleton and formation of membrane ruffles required for cell 
movement [52]; it, also, leads to rapid activation of the focal 
adhesion (FA) kinase and increased formation of FA complexes, 
which provide anchoring sites for cell attachment to the 
extracellular matrix during cell movement and invasion [62]. 
During preparation of this manuscript a study came out that 
described the PR-targetome during mammary gland branching 
morphogenesis [63]. Interestingly enough, the authors, also, 
identified components of "smaU-GTPases mediated signal trans- 
duction" to be highly (-nric hed among the target genes [63] . They 
went on to show that progesterone activation of Rac (a Rho small- 
GTPase) signaling is necessary to induce side-brunching [63]. 
These findings are in agreement with our own data highlighting 
the small-GTPases signaling pathways as progestin-regulated 
networks in healthy and maUgnant mammary cells. 

PR dictates alternative promoter use decisions in breast 
cancer cells 

For the identification of differentially expressed transcripts 
(DETs) after progestin stimulation, we used CuffdifiT 2 [32] as 
described before. We set more stringent parameters for data 
analysis on the transcript level (^ 1.5-fold change of expression and 
p-value£0.05) leading to the identification of 1014 DETs (data not 
shown). To ensure the validity of our findings, we limited further 
analysis to the top 80 DETs (p-value<5xl0"^) (File S2, table 
A_DETs). Thirty of them were generated by genes that had 
multiple- transcripts in the RefSeq database [64] and they are listed 
in File S2, table B_annotated transcripts. Annotation of the 
transcript \ ariants (columns B and C in Suppl. file 3, sheet 2) was 
done according to the RefSeq database. The protein isoforms they 
encode are denoted as "canonical" according to the Uniprot 
database (column D). Information from both the RefSeq and the 
Uniprot databases was used in order to mark protein isoforms as 
"distinct" (different than the canonical) or "not distinct" (same as 
the canonical) (column D). Transcript variants generated by 
alternative splicing or alternate promoter usage relative to the 
canonical isoform are denoted as AS or AP respectively (column 
D). There is an equal number (12) of PR-regulated transcript 
variants that are generated either by AS or by AP, while 6 of them 
use both mechanisms (column F). Interestingly, the PR-regulated 
transcript variants of 26 out of the 29 genes (1 gene encodes for 
non-coding RNAs) encode for distinct protein isoforms (columns D 
and F), suggesting that progestin signaling may contribute 
significantly to the proteomic diversity of the cell. 

Eight {ARID5, GREBl, KAMKl, NETl, PFKB3, RCAJVl, TIPARP 
and T,SC22D3) of tlu-sc 30 gc'n(;s had two or more transcript 
variants expressed in vehicle-treated T47D cells (data not shown), 
however, only one of these variants was differentially regulated 
after progestin treatment (File S2, table A_DETs). To confirm 
these results, we examined the 3 genes [METl, KANKl and 
TSC22D3) whose transcript variants appeared to be expressed in, 
approximately, similar levels in vehicle-treated T47D cells 
(Table 2). We designed variant-specific primers and we 
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Table 1. Progestln-regulated small-GTPases, GAPs and GEFs. 



Ras family 



Small GTPases 


GAPS 


GEFs 


LRRK2 


RASA2 


PLCE 


RASLIOB 


RASA3 


RASGEFIA 


RERG 


RASA4 


RASGRPl 


RRAS2 


RASALl 


SOSl 


SIPAl 


Rho family 


Small GTPases 


GAPS 


GEFs 


RHOB 


ARHGAP17 


AKAP13 


RH0BTB2 


ARHGAP23 


ARHGEF2 


RHOU 


ARHGAP26 


ARHGEF26 


RHOV 


ARHGAP32 


ARHGEF37 


RNDl 


ARHGAG40 


FGD3 




ARHGAP42 


FGD4 




SRGAPl 


ITSNl 




SRGAP2 


NETl 




SRGAP3 


PLEKHG6 




SRGAP2B 


SPATA13 




STARD13 


VAV3 


Arf family 


Small GTPases 


GAPS 


GEFs 


ARF6 


ASAP2 


CYTHl 


ARL4C 




IQSECl 


ARL4D 




PSD4 


ARL5B 


Rab family 


Small GTPases 


GAPS 




RAB12 


TBC1D12 




RAB33B 


TBCl D20 




RAB4A 


TBC1D4 




RAB9B 


Rap family 


Small GTPases 


GAPS 


GEFs 


RAP2B 


RAP1GAP2 


RAPGEF6 



RAP2C 



doi:10.1371/journal.pone.0098404.t001 



performed a time-course RT-qPCR study; the results are shown in 
Figure 5. For all three genes, transcript variants 1 (herein called 
NETl.l, KANKl.l and TSC22D3.1) retained a, relatively, stable 
level of expression, while transcript variants 2 (herein called 
NET 1.2, KANK1.2 and TSC22D3.2) were strongly induced 
shordy after progestin treatment (Figures 5A— C). This is in 
absolute agreement with the RNA-seq data (Table 2). Given that 
an older study had found that both NETl transcripts were up- 
regulated after progestin treatment [22], we used two different sets 
of primers for the NETl transcript variant 1 (NETl.l) and we 
repeated the experiment in 3 biological rephcates; both primer sets 
gave identical results in all three experiments (Figure 5A). 

We further investigated the mechanisms that mediated this 
preferential transcript regulation. In all three genes, the alternative 
transcripts were generated by alternative promoters. In the case of 



KAMKl and NETl, our ChlP-seq experiments had revealed PR- 
binding sites in the first intron of KANK1.2 (not shown) and in the 
promoter of NETl. 2 (Figure 4B); these sites were confirmed by 
ChlP-qPCR experiments (Figures S2C and 4C respectively). 
For the TSC22D3 gene we did not find any PR-binding sites within 
a 50Kb distance. It is possible that our ChlP-seq experiments 
failed to capture such a site. It is also possible that the PR 
regulatory site is located in a greater distance or that the induction 
of TSC22D3.2 is not a direct transcriptional effect of the receptor. 
Since TSC22D3 is a known glucocorticoid-regulated gene with 
weU-characterized GREs [65] , we designed primers encompassing 
the GR-binding site and performed ChlP-qPCR for the PR. We 
did not find any enrichment for PR-binding in this region (data 
not shown). Subsequendy, we performed polll ChlP-qPCR 
experiments and we showed that polll recruitment on the 
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qRT-PCR for small-GTPase signaling components 



n n n n n 




n 



n 



B 



f.vS^ 



chrie; 5,485; eeel 

PR peaks 



,cP^ ,^0^ ^e^^ ^^^^ 

5,496, eeel 



5,495, eeel 



NETl 4^- 



NETl 

Chrl2! 32,689, ( 

PR peaks 

FGD4 

Chris I 
PR peaks I 
FIKFIP13 



HE- 



32,766,! 



85,956,1 



PRChlPs 




NETl 



FGD4 



AKAP13 



GAPDH 



= 0.3 



H3K4meChlPs 




NETl 



FGD4 



AKAP13 neg.control 



Figure 4. Components of small-GTPases signaling pathways are direct PR targets. (A) T47D cells, treated with R5020 or EtOH for 3 hrs, 
were used for RNA extraction. RT-qPCR was performed using specific primers for two small GTPases (ARF6, RERG) and several GEFs (AKAP13, FGD4, 
NETl, PSD4, VAV3) and GAPs {ASAP2, ARHGAP17, ARHGAP42, RAP1GAP2). Expression levels were normalized to GAPDH. Error bars indicate the SEM. (p- 
value<0.005, single asterisk indicates p-value<0.5 and double asterisk p-value<0.05) (B) Visualization of the ChlP-seq data on the UCSC browser. PR 
binding sites in an intron of the NETl gene and in distal intragenic regions of FGD4 and AKAP13 are depicted by the black blocks. (C) ChlP-qPCR 
experiments show increased PR binding in the three sites shown in (B) after 1 hr of progestin treatment (single asterisk indicates p-value<0.05, 
double asterisk p-value<0.01 and triple asterisk p-value<0.001 ). The promoter of GAPDH is used as a negative control. (D) ChlP-qPCR experiments for 
methylated H3K4. The PR binding sites shown in (B) are also enriched for this mark (p-value<0.05) of active enhancers and promoters. An intergenic 
region is used as a negative control. Error bars represent SE. 
doi:10.1371/journal.pone.0098404.g004 



Table 2. Expression levels (in FPKM) of transcript variants according to the RNA-seq data. 





Official gene symbol 


GenBank accession ID 


R5020 (FPKM) 


EtOH (FPKM) 


log2(fold_change) 


NETl 


NETl transcript variant 1 


NM_001047160 


10.0867 


8.66451 


-0.219266 


NETl transcript variant 2 


NM_005863 


118.522 


5.59903 


-4.40383 


KANK1 


KANK1 transcript variant 1 


NM_153186 


12.1742 


1 .73468 


-2.81108 


KANKl transcript variant 2 


NM_015158 


1 .87683 


2.07454 


0.14449 


TSC22D3 


TSC22D3 transcript variant 1 


NM_1 98057 


2.64031 


2.02448 


-0383154 


TSC22D3 transcript variant 2 


NM_004089 


38.2182 


4.95535 


-2.9472 



doi:1 0.1 371 /journal.pone.0098404.t002 
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Figure 5. PR regulation of specific transcript variants. T47D cells were treated for 0 to 12 hrs with R5020. RNA was extracted and used for RT- 
qPCR using transcript-specific primers. Expression levels were normalized to GAPDH. (A) Time course experiments for the NETl transcripts show 
strong induction of the NETl .2 variant after progestin treatment, while the NETl .1 variant retains a stable expression pattern. To confirm the NETl .1 
levels of expression, two different set of transcript-specific primers were used (in green and red) giving identical results. (B) Time course experiments 
for the two KANK1 transcripts confirm upregulation of the KANK1.2 after progestin treatment. (C) Time course experiments for the two TSC22D3 
transcript variants show that only the TSC22D3.2 variant is strongly upregulated after progestin treatment. (D) ChlP-qPCR experiments show 
significant increase in polll binding on the promoters of NETl .2, KANK1.2 and TSC22D3.2 after 1 hr of progestin treatment (ethanol-treated control is 
set to 1) (p-value<0.05). Error bars represent SE except in (A), where they represent the SEIVI. 
doi:1 0.1 371 /journal.pone.0098404.g005 



NETl. 2, KANK1.2 and TSC22D3.2 promoters was increased in 
response to progestin, but it was not, significantly, affected on the 
promoters of the non-progestin regulated variants (Figure 5D). 
Assumingly, PR recruits transcription co-factors that facilitate 
polll loading onto the regulated promoters. Communication of the 
PR complex with the non-regulated promoters may be prevented 
through binding of the insulator factor CTCF, as it has been 



suggested for the differential regulation of transcript variants by 
the estrogen receptor [22]. 

These findings are of particular importance, since, oftentimes, 
transcript variants of the same gene encode for different protein 
isoforms with diverse functions. A striking example is provided by 
the JVETl gene itself NETl is a RhoA specific GEE. RhoA is 
aberrantly expressed in many human cancers, including breast 
cancer, and its activation is essential for cancer cell migration and 
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invasion [66] . The two transcript variants generated by the NETl 
gene (Figure 4 B), a long one known as NETl (NET 1.1, 
NM_001047160) and a shorter one also known as NETIA 

(NET 1.2, NM_005863), encode for isoforms with potentially 
difTerent functions [22]. Studies have indicated that the NETl 
isoform is important for cell proliferation [22], while the- NETIA 
controls cytoskeleton reorganization and cell motility [22,66,67]. 
Specifically, NETIA is required for FAK activation, focal 
adhesion maturation and it is necessary for amoeboid extracellular 
matrix invasion of breast cancer cells [66] . It is, therefore, possible 
that NETIA plays an important role in metastatic breast cancer. 
According to our data PR induces over-expression specifically of 
the NETIA (NETl. 2) variant. It is tempting to speculate that 
increased levels of NETIA may lead to aberrant activation of 
RhoA and mediate breast cancer metastasis. 

The transcript variants generated by KANKl and TSC22D3, 
also, encode for distinct protein isoforms (File S2, table 
B_aiinotated transcripts) with, potentially, different functions. 
KANKl was first identified as a candidate tumor suppressor gene 
for renal cell carcinoma and it has several alternative promoters, 
by which different types of transcripts are generated from the same 
locus (reviewed in [68]). It encodes an ankyrin-repeat domain- 
containing protein, which, negatively, regulates the formation of 
actin stress fibers and cell migration through inhibition of RhoA, 
while it may also have a growth inhibitory effect [68] . Two types of 
KANKl proteins have been reported with distinct tissue 
distribution [69], but they have not been functionally character- 
ized. TSC22D3 (also known as GIL^ is a well-studied glucocor- 
ticoid-regulated gene in the immune system. Four different 
isoforms have been identified and characterized; these are not 
functionally redundant, but rather, they are involved in distinct 
aspects of cellular physiology and modulate distinct signahng 
pathways [70]. The PR-regulated variant (NM_004089) encodes 
for isoform GILZl [70]. GILZl appears to be the most potent 
isoform in stimulation of Na""" transport and repression of NF-kB. 
Few studies have addressed the role of GILZ in cancer. A recent 
study showed that it was expressed in epithelial ovarian cancer, 
where it increased tumor cell proliferation, activated AKT and 
regulated p2 1 and cyclin D 1 expression, events that are associated 
with tumor progression [71]. 

Taken together, the above data strongly suggest that the PR can 
dictate promoter use decisions leading to significant expression 
changes of specific transcript variants. Future studies should aim to 
characterize these variants, examine their potential clinical value 
in hormone responsive breast tumors and, more importantly, 
determine the functional properties of the isoforms they encode. 

Conclusions 

The era of next-generation sequencing has immensely advanced 
our understanding in nuclear hormone receptor signaling [72]. 
Several studies were particularly aimed to the estrogen receptor 
and they offered significant new insights into the complex 
regulatory gene networks controlled by the receptor in breast 
cancer cells [73]. However, this powerful technique has not been 
as extensively used in the study of the PR. A few recent papers 
have performed genome-wide mapping of PR binding sites in 
breast cancer cells [74,75,76] confirming that the receptor binds 
more frequently in intra- and inter- genie regions than in the 
promoters of target genes in agreement with earlier observations 
[25,59]. To complement these studies and shed more light into the 
progestin-regulated gene networks in breast cancer cells, we 
employed 50bp paired-end sequencing to identify early responsive 
transcripts. Genes that display expression changes at early time 



points are more likely to be primary PR targets; however, these 
changes are usually of small magnitude and are often missed by 
microarray studies. Our experimental approach provided the 
necessary depth to detect such changes. We identified 1287 DEGs 
and we extensively validated new targets by RT-qPCR. Our data 
offer a new insight into the multifaceted role of PR in breast cancer 
biology and point to new routes future research can take. For 
example, we find that the PR alters the expression levels of key 
transcription factors and, in this manner, it may be affecting 
important transcriptional networks that govern cell fate. It remains 
to be seen wh(;ther these changes accommodate the needs of the 
cancer cell and corroborate the role of PR in promoting 
tumorigenesis. Of particular importance is the finding that the 
PR regulates a plethora of genes that participate in small-GTPase 
signaling cascades. Integration of the transcriptome and PR- 
cistrome profiling of hormonally treated cells strongly suggests that 
several smaU-GTPases regulators are direct PR targets. It is likely 
that progestin modulation of their expression levels leads to 
deregulation of the respective smaU-GTPases and the processes 
they control and eventually contributes to mammary tumorigen- 
esis. Finally, our data reveal that the PR regulates the expression of 
specific transcript variants, and it, most likely, contributes to a 
more complex proteomic profile of the breast cancer cell. Future 
studies will show whether these specific PR-regulated transcripts 
may have clinical utility in prognosis and/or the development of 
targeted therapies. 

Supporting information 

Figure SI PR-regulated genes involved in cell death/ 
apoptosis. GO annotation and literature search led to the 
functional categorization of geni;s involved in cell death as pro- 

apoptotic or anti-apoptotic. A few genes were denoted as both pro- 

and anti- apoptotic and were counted in both groups. 

(TIF) 

Figure S2 ChlP-sequencing experiments for the PR 
identify receptor binding sites in T47D cells. (A) Cells 
treated with R5020 for 1 hr were used for ChIP experiments with 
an antibody against the PR. Immunoprecipitated DNA was used 
in sequencing experiments and representative data are shown 
here. Four PR binding sites (depicted by black blocks) were found 
in distal enhancer elements of FKBP5 transcript variant 1 
(NM_0()4117), which is the PR-regulated transcript. (B) ChlP- 
qPCR experiments for the PR were performed in progestin- and 
vehicle- treated cells. The primers used amplified part of some of 
the PR binding sites (depicted by red arrows and labeled a-e) 
identified in the FKBP5 locus by ChlP-seq. Error bars indicate the 
SEM. (single asterisk indicates p-value<0.05, double asterisk p- 
value<0.005 and triple asterisk p-value<0.005). (C) As in (B), but 
primers used amplified part of the PR binding sites associated with 
the genes shown. A known PR binding site in the promoter of 
CDKNIA is used as a positive control and the promoter of 
GAPDH as a negative one (p-value<0.001, single asterisk 
indicates p-value<0.05, double asterisk p-value<0.01 and triple 
asterisk p-value< 0.005). 
(TIF) 

Table SI Differentially expressed genes in T47D cells 
after 3 hours of progestin treatment. The log2(fold change) 
of the ratio of EtOH-treated to R5020-treated is given. Genes 
identified by previous gene expression microarrays studies are 
accompanied by reference numbers. 
(XLSX) 



PLOS ONE I www.plosone.org 



11 



June 2014 I Volume 9 | Issue 6 | e98404 



PR-Regulated Transcriptome in Breast Cancer Cells 



File SI Gene Ontology analysis of progestin-regulated 
genes. 

PCLSX) 

File S2 Differentially expressed transcripts (DETs) in 
T47D cells after 3 hours of progestin treatment. The top 

80 DETs (p-value<5xl0"^) are listed in table A_DETs. Thirty of 

them are generated by genes that have multiple transcripts in the 
RefSeq database [64] and they are listed in table B_annotated 
transcripts. 

po^sx) 
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